Overview

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    434          444
Missing cells (%)                8.1%         8.3%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
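
The missing-cell percentages above are consistent with dividing the missing-cell count by the total number of cells (variables times observations). The following is a minimal Python sketch of that arithmetic, using only the figures from the table; the function name is illustrative and is not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over all cells in the table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Figures taken from the dataset statistics table.
pct_a = missing_cell_pct(434, 12, 446)
pct_b = missing_cell_pct(444, 12, 446)

print(f"Dataset A: {pct_a:.1f}%")
print(f"Dataset B: {pct_b:.1f}%")
```

Both datasets have 12 x 446 = 5352 cells, so the ten extra missing cells in Dataset B account for the 0.2-point difference.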

Variable types

               Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Dataset A                                         Dataset B                                         Alert
Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex    High Correlation
Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived    High Correlation
Age has 86 (19.3%) missing values                 Age has 95 (21.3%) missing values                 Missing
Cabin has 346 (77.6%) missing values              Cabin has 349 (78.3%) missing values              Missing
PassengerId has unique values                     PassengerId has unique values                     Unique
Name has unique values                            Name has unique values                            Unique
SibSp has 317 (71.1%) zeros                       SibSp has 308 (69.1%) zeros                       Zeros
Parch has 343 (76.9%) zeros                       Parch has 346 (77.6%) zeros                       Zeros
Fare has 10 (2.2%) zeros                          Fare has 5 (1.1%) zeros                           Zeros
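
The percentages in the Missing and Zeros alerts match the raw count divided by the 446 observations in each dataset. A small Python sketch of how such an alert line could be rebuilt from a count is shown below; `format_alert` is a hypothetical helper written for illustration, not the report generator's own code.

```python
N_OBSERVATIONS = 446  # rows per dataset, from the dataset statistics


def format_alert(column: str, count: int, kind: str, n: int = N_OBSERVATIONS) -> str:
    """Rebuild an alert message such as 'Age has 86 (19.3%) missing values'."""
    pct = 100.0 * count / n
    return f"{column} has {count} ({pct:.1f}%) {kind}"


print(format_alert("Age", 86, "missing values"))
print(format_alert("Cabin", 349, "missing values"))
print(format_alert("Fare", 5, "zeros"))
```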

Reproduction

                          Dataset A                     Dataset B
Analysis started          2023-08-01 08:59:41.769583    2023-08-01 08:59:47.057435
Analysis finished         2023-08-01 08:59:47.055835    2023-08-01 08:59:52.116390
Duration                  5.29 seconds                  5.06 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
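
The reported durations can be checked from the start and finish timestamps. The sketch below uses only the Python standard library; the timestamp format string is an assumption inferred from how the values are printed in the report.

```python
from datetime import datetime

# Format inferred from the timestamps shown in the report (assumption).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2023-08-01 08:59:41.769583", "2023-08-01 08:59:47.055835")
dur_b = duration_seconds("2023-08-01 08:59:47.057435", "2023-08-01 08:59:52.116390")

print(f"Dataset A: {dur_a:.2f} seconds")
print(f"Dataset B: {dur_b:.2f} seconds")
```

The timestamps also show that Dataset B's analysis started about 1.6 ms after Dataset A's finished, so the two profiles ran back to back.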

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            446.98655    444.23318
Minimum         2            2
Maximum         890          889
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      2            2
5-th percentile              47.5         43.5
Q1                           239.25       229.5
median                       441.5        448.5
Q3                           665.75       662.75
95-th percentile             847.75       846.5
Maximum                      890          889
Range                        888          887
Interquartile range (IQR)    426.5        433.25
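
Range and interquartile range are derived from the other rows of this table: range is maximum minus minimum, and IQR is Q3 minus Q1. A short Python sketch with the PassengerId figures follows; the helper name is illustrative only.

```python
def spread_summary(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Derive the range and interquartile range from four quantile values."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles from the table above.
spread_a = spread_summary(minimum=2, q1=239.25, q3=665.75, maximum=890)
spread_b = spread_summary(minimum=2, q1=229.5, q3=662.75, maximum=889)

print(spread_a)
print(spread_b)
```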

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 255.12225        258.55853
Coefficient of variation (CV)      0.57076046       0.58203335
Kurtosis                           -1.146847        -1.2119741
Mean                               446.98655        444.23318
Median Absolute Deviation (MAD)    214              216.5
Skewness                           0.004981964      -0.014959797
Sum                                199356           198128
Variance                           65087.362        66852.512
Monotonicity                       Not monotonic    Not monotonic
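
Several rows of the descriptive statistics are linked: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum is the mean times the number of non-missing values. The Python sketch below checks those relationships for PassengerId in Dataset A; it works from the rounded figures in the report, so results agree only up to rounding.

```python
import math


def derived_stats(mean: float, std: float, n: int) -> dict:
    """Quantities that follow from the mean, standard deviation, and count."""
    return {"cv": std / mean, "variance": std ** 2, "sum": mean * n}


# PassengerId, Dataset A: mean, standard deviation, and 446 non-missing values.
stats_a = derived_stats(mean=446.98655, std=255.12225, n=446)

for name, value in stats_a.items():
    print(f"{name}: {value:.6g}")
```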
Histogram with fixed size bins (bins=50)
Dataset A
Value                 Count    Frequency (%)
228                   1        0.2%
99                    1        0.2%
237                   1        0.2%
218                   1        0.2%
103                   1        0.2%
719                   1        0.2%
710                   1        0.2%
329                   1        0.2%
158                   1        0.2%
701                   1        0.2%
Other values (436)    436      97.8%
Dataset B
Value                 Count    Frequency (%)
619                   1        0.2%
81                    1        0.2%
87                    1        0.2%
565                   1        0.2%
588                   1        0.2%
231                   1        0.2%
177                   1        0.2%
843                   1        0.2%
832                   1        0.2%
583                   1        0.2%
Other values (436)    436      97.8%
Dataset A, values in ascending order
Value    Count    Frequency (%)
2        1        0.2%
4        1        0.2%
5        1        0.2%
6        1        0.2%
8        1        0.2%
10       1        0.2%
14       1        0.2%
18       1        0.2%
21       1        0.2%
23       1        0.2%
Dataset B, values in ascending order
Value    Count    Frequency (%)
2        1        0.2%
4        1        0.2%
5        1        0.2%
9        1        0.2%
11       1        0.2%
12       1        0.2%
13       1        0.2%
17       1        0.2%
20       1        0.2%
22       1        0.2%

Survived
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
Category counts, Dataset A: 0 (283), 1 (163)
Category counts, Dataset B: 0 (282), 1 (164)

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    0            1
2nd row    1            0
3rd row    0            1
4th row    0            0
5th row    1            0

Common Values

Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Most occurring characters

Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Most occurring categories

Dataset A
Value             Count    Frequency (%)
Decimal Number    446      100.0%
Dataset B
Value             Count    Frequency (%)
Decimal Number    446      100.0%

Most frequent character per category

Decimal Number
Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Most occurring scripts

Dataset A
Value     Count    Frequency (%)
Common    446      100.0%
Dataset B
Value     Count    Frequency (%)
Common    446      100.0%

Most frequent character per script

Common
Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    446      100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    446      100.0%

Most frequent character per block

ASCII
Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%
Dataset B
Value    Count    Frequency (%)
0        282      63.2%
1        164      36.8%

Pclass
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
Category counts, Dataset A: 3 (244), 1 (101), 2 (101)
Category counts, Dataset B: 3 (244), 1 (104), 2 (98)

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    3            2
2nd row    1            3
3rd row    2            2
4th row    3            3
5th row    1            3

Common Values

Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Most occurring characters

Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Most occurring categories

Dataset A
Value             Count    Frequency (%)
Decimal Number    446      100.0%
Dataset B
Value             Count    Frequency (%)
Decimal Number    446      100.0%

Most frequent character per category

Decimal Number
Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Most occurring scripts

Dataset A
Value     Count    Frequency (%)
Common    446      100.0%
Dataset B
Value     Count    Frequency (%)
Common    446      100.0%

Most frequent character per script

Common
Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    446      100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    446      100.0%

Most frequent character per block

ASCII
Dataset A
Value    Count    Frequency (%)
3        244      54.7%
1        101      22.6%
2        101      22.6%
Dataset B
Value    Count    Frequency (%)
3        244      54.7%
1        104      23.3%
2        98       22.0%

Name
Text

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       65           82
Median length    47           50
Mean length      26.497758    27.273543
Min length       12           12

Characters and Unicode

                       Dataset A    Dataset B
Total characters       11818        12164
Distinct characters    60           59
Distinct categories    7            7
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
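
As an illustration of the character properties mentioned above, the Python standard library exposes each code point's general category through `unicodedata.category`. The sketch below counts categories for the first Dataset A sample name. The two-letter codes correspond to the category names used in this report (for example `Lu` is Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator, `Po` Other Punctuation, `Ps` Open Punctuation, `Pe` Close Punctuation). Script and block lookups are not available in `unicodedata`, so this covers categories only and is not necessarily how the report computes them.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = 'Lovell, Mr. John Hall ("Henry")'
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```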

Unique

              Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

           Dataset A                                                   Dataset B
1st row    Lovell, Mr. John Hall ("Henry")                             Becker, Miss. Marion Louise
2nd row    Marechal, Mr. Pierre                                        McEvoy, Mr. Michael
3rd row    Matthews, Mr. William John                                  Richards, Master. William Rowe
4th row    Van Impe, Miss. Catharina                                   Murdlin, Mr. Joseph
5th row    Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)    Osen, Mr. Olaf Elon
Dataset A
Value                 Count    Frequency (%)
mr                    278      15.4%
miss                  84       4.7%
mrs                   54       3.0%
john                  31       1.7%
william               29       1.6%
master                17       0.9%
henry                 16       0.9%
james                 14       0.8%
charles               11       0.6%
thomas                11       0.6%
Other values (892)    1259     69.8%
Dataset B
Value                 Count    Frequency (%)
mr                    261      14.2%
miss                  82       4.5%
mrs                   75       4.1%
william               30       1.6%
john                  26       1.4%
master                18       1.0%
henry                 17       0.9%
anna                  14       0.8%
mary                  14       0.8%
george                13       0.7%
Other values (894)    1284     70.0%
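
The word tables above list lowercase tokens without surrounding punctuation ("mr" rather than "Mr."). A rough Python sketch of a tokenizer that produces counts of that shape is given below. Splitting on whitespace and stripping punctuation from both ends of each token is an assumption inferred from the tables, not the report generator's documented behaviour, and `word_counts` is a hypothetical name.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase whitespace-separated tokens, ignoring edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.split():
            word = token.strip(string.punctuation).lower()
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


# The five Dataset A sample names shown in this section.
names = [
    'Lovell, Mr. John Hall ("Henry")',
    "Marechal, Mr. Pierre",
    "Matthews, Mr. William John",
    "Van Impe, Miss. Catharina",
    "Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",
]
counts = word_counts(names)
print(counts.most_common(3))
```

On these five rows the most frequent token is "mr", in line with the full-dataset tables.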

Most occurring characters

Dataset A
Value                Count    Frequency (%)
(space)              1359     11.5%
r                    974      8.2%
e                    819      6.9%
a                    797      6.7%
n                    687      5.8%
i                    642      5.4%
s                    614      5.2%
M                    550      4.7%
l                    509      4.3%
o                    488      4.1%
Other values (50)    4379     37.1%
Dataset B
Value                Count    Frequency (%)
(space)              1389     11.4%
r                    995      8.2%
a                    879      7.2%
e                    868      7.1%
s                    656      5.4%
n                    651      5.4%
i                    628      5.2%
M                    577      4.7%
l                    531      4.4%
o                    509      4.2%
Other values (49)    4481     36.8%

Most occurring categories

Dataset A
Value                Count    Frequency (%)
Lowercase Letter     7550     63.9%
Uppercase Letter     1812     15.3%
Space Separator      1359     11.5%
Other Punctuation    958      8.1%
Close Punctuation    67       0.6%
Open Punctuation     67       0.6%
Dash Punctuation     5        < 0.1%
Dataset B
Value                Count    Frequency (%)
Lowercase Letter     7814     64.2%
Uppercase Letter     1842     15.1%
Space Separator      1389     11.4%
Other Punctuation    947      7.8%
Open Punctuation     83       0.7%
Close Punctuation    83       0.7%
Dash Punctuation     6        < 0.1%

Most frequent character per category

Space Separator
Dataset A
Value      Count    Frequency (%)
(space)    1359     100.0%
Dataset B
Value      Count    Frequency (%)
(space)    1389     100.0%
Lowercase Letter
Dataset A
Value                Count    Frequency (%)
r                    974      12.9%
e                    819      10.8%
a                    797      10.6%
n                    687      9.1%
i                    642      8.5%
s                    614      8.1%
l                    509      6.7%
o                    488      6.5%
t                    324      4.3%
h                    264      3.5%
Other values (16)    1432     19.0%
Dataset B
Value                Count    Frequency (%)
r                    995      12.7%
a                    879      11.2%
e                    868      11.1%
s                    656      8.4%
n                    651      8.3%
i                    628      8.0%
l                    531      6.8%
o                    509      6.5%
t                    336      4.3%
h                    262      3.4%
Other values (16)    1499     19.2%
Uppercase Letter
Dataset A
Value                Count    Frequency (%)
M                    550      30.4%
A                    140      7.7%
J                    126      7.0%
H                    98       5.4%
S                    91       5.0%
C                    88       4.9%
E                    78       4.3%
L                    73       4.0%
W                    69       3.8%
B                    68       3.8%
Other values (15)    431      23.8%
Dataset B
Value                Count    Frequency (%)
M                    577      31.3%
A                    131      7.1%
J                    111      6.0%
H                    106      5.8%
S                    94       5.1%
C                    87       4.7%
E                    83       4.5%
W                    71       3.9%
B                    66       3.6%
R                    64       3.5%
Other values (15)    452      24.5%
Other Punctuation
Dataset A
Value    Count    Frequency (%)
,        446      46.6%
.        446      46.6%
"        60       6.3%
'        5        0.5%
/        1        0.1%
Dataset B
Value    Count    Frequency (%)
.        446      47.1%
,        446      47.1%
"        50       5.3%
'        5        0.5%
Close Punctuation
Dataset A
Value    Count    Frequency (%)
)        67       100.0%
Dataset B
Value    Count    Frequency (%)
)        83       100.0%
Open Punctuation
Dataset A
Value    Count    Frequency (%)
(        67       100.0%
Dataset B
Value    Count    Frequency (%)
(        83       100.0%
Dash Punctuation
Dataset A
Value    Count    Frequency (%)
-        5        100.0%
Dataset B
Value    Count    Frequency (%)
-        6        100.0%

Most occurring scripts

Dataset A
Value     Count    Frequency (%)
Latin     9362     79.2%
Common    2456     20.8%
Dataset B
Value     Count    Frequency (%)
Latin     9656     79.4%
Common    2508     20.6%

Most frequent character per script

Common
Dataset A
Value      Count    Frequency (%)
(space)    1359     55.3%
,          446      18.2%
.          446      18.2%
)          67       2.7%
(          67       2.7%
"          60       2.4%
-          5        0.2%
'          5        0.2%
/          1        < 0.1%
Dataset B
Value      Count    Frequency (%)
(space)    1389     55.4%
.          446      17.8%
,          446      17.8%
(          83       3.3%
)          83       3.3%
"          50       2.0%
-          6        0.2%
'          5        0.2%
Latin
Dataset A
Value                Count    Frequency (%)
r                    974      10.4%
e                    819      8.7%
a                    797      8.5%
n                    687      7.3%
i                    642      6.9%
s                    614      6.6%
M                    550      5.9%
l                    509      5.4%
o                    488      5.2%
t                    324      3.5%
Other values (41)    2958     31.6%
Dataset B
Value                Count    Frequency (%)
r                    995      10.3%
a                    879      9.1%
e                    868      9.0%
s                    656      6.8%
n                    651      6.7%
i                    628      6.5%
M                    577      6.0%
l                    531      5.5%
o                    509      5.3%
t                    336      3.5%
Other values (41)    3026     31.3%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    11818    100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    12164    100.0%

Most frequent character per block

ASCII
Dataset A
Value                Count    Frequency (%)
(space)              1359     11.5%
r                    974      8.2%
e                    819      6.9%
a                    797      6.7%
n                    687      5.8%
i                    642      5.4%
s                    614      5.2%
M                    550      4.7%
l                    509      4.3%
o                    488      4.1%
Other values (50)    4379     37.1%
Dataset B
Value                Count    Frequency (%)
(space)              1389     11.4%
r                    995      8.2%
a                    879      7.2%
e                    868      7.1%
s                    656      5.4%
n                    651      5.4%
i                    628      5.2%
M                    577      4.7%
l                    531      4.4%
o                    509      4.2%
Other values (49)    4481     36.8%

Sex
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
Category counts, Dataset A: male (306), female (140)
Category counts, Dataset B: male (288), female (158)

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6278027    4.7085202
Min length       4            4

Characters and Unicode

                       Dataset A    Dataset B
Total characters       2064         2100
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
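
For a categorical variable such as Sex, the total character count and mean length follow directly from the category counts ("male" has 4 characters, "female" has 6). The Python sketch below reproduces the figures reported above from the counts of 306/140 and 288/158; the helper name is illustrative.

```python
def length_stats(value_counts: dict) -> tuple:
    """Return (total characters, mean length) for a mapping of value -> count."""
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    n_values = sum(value_counts.values())
    return total_chars, total_chars / n_values


total_a, mean_a = length_stats({"male": 306, "female": 140})
total_b, mean_b = length_stats({"male": 288, "female": 158})

print(total_a, round(mean_a, 7))
print(total_b, round(mean_b, 7))
```

Dataset B has 18 more "female" rows than Dataset A, which adds 36 characters and raises the mean length accordingly.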

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    male         female
2nd row    male         male
3rd row    male         male
4th row    female       male
5th row    female       male

Common Values

Dataset A
Value     Count    Frequency (%)
male      306      68.6%
female    140      31.4%
Dataset B
Value     Count    Frequency (%)
male      288      64.6%
female    158      35.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value     Count    Frequency (%)
male      306      68.6%
female    140      31.4%
Dataset B
Value     Count    Frequency (%)
male      288      64.6%
female    158      35.4%

Most occurring characters

Dataset A
Value    Count    Frequency (%)
e        586      28.4%
m        446      21.6%
a        446      21.6%
l        446      21.6%
f        140      6.8%
Dataset B
Value    Count    Frequency (%)
e        604      28.8%
m        446      21.2%
a        446      21.2%
l        446      21.2%
f        158      7.5%

Most occurring categories

Dataset A
Value               Count    Frequency (%)
Lowercase Letter    2064     100.0%
Dataset B
Value               Count    Frequency (%)
Lowercase Letter    2100     100.0%

Most frequent character per category

Lowercase Letter
Dataset A
Value    Count    Frequency (%)
e        586      28.4%
m        446      21.6%
a        446      21.6%
l        446      21.6%
f        140      6.8%
Dataset B
Value    Count    Frequency (%)
e        604      28.8%
m        446      21.2%
a        446      21.2%
l        446      21.2%
f        158      7.5%

Most occurring scripts

Dataset A
Value    Count    Frequency (%)
Latin    2064     100.0%
Dataset B
Value    Count    Frequency (%)
Latin    2100     100.0%

Most frequent character per script

Latin
Dataset A
Value    Count    Frequency (%)
e        586      28.4%
m        446      21.6%
a        446      21.6%
l        446      21.6%
f        140      6.8%
Dataset B
Value    Count    Frequency (%)
e        604      28.8%
m        446      21.2%
a        446      21.2%
l        446      21.2%
f        158      7.5%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    2064     100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    2100     100.0%

Most frequent character per block

ASCII
Dataset A
Value    Count    Frequency (%)
e        586      28.4%
m        446      21.6%
a        446      21.6%
l        446      21.6%
f        140      6.8%
Dataset B
Value    Count    Frequency (%)
e        604      28.8%
m        446      21.2%
a        446      21.2%
l        446      21.2%
f        158      7.5%

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        75           74
Distinct (%)    20.8%        21.1%
Missing         86           95
Missing (%)     19.3%        21.3%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.503       29.657407
Minimum         0.42         0.42
Maximum         70.5         74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
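
The two percentages in the Age summary use different denominators. Missing (%) is relative to all 446 observations, while Distinct (%) matches the distinct count divided by the non-missing observations only (360 in Dataset A, 351 in Dataset B). The Python sketch below shows that arithmetic; the function names are illustrative, and the denominators are inferred from the reported figures rather than taken from the tool's documentation.

```python
def pct_of_all(count: int, n_observations: int) -> float:
    """Percentage relative to every row, as used for Missing (%)."""
    return 100.0 * count / n_observations


def pct_of_non_missing(count: int, n_observations: int, n_missing: int) -> float:
    """Percentage relative to rows with a value, as Distinct (%) appears to use."""
    return 100.0 * count / (n_observations - n_missing)


missing_pct_a = pct_of_all(86, 446)
distinct_pct_a = pct_of_non_missing(75, 446, 86)
distinct_pct_b = pct_of_non_missing(74, 446, 95)

print(f"{missing_pct_a:.1f}% {distinct_pct_a:.1f}% {distinct_pct_b:.1f}%")
```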

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0.42         0.42
5-th percentile              4            5.5
Q1                           21           21
median                       28           29
Q3                           37           36
95-th percentile             57.05        58
Maximum                      70.5         74
Range                        70.08        73.58
Interquartile range (IQR)    16           15

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 14.387236        14.099824
Coefficient of variation (CV)      0.48765333       0.47542336
Kurtosis                           0.17861103       0.44446163
Mean                               29.503           29.657407
Median Absolute Deviation (MAD)    8                8
Skewness                           0.37078596       0.48797487
Sum                                10621.08         10409.75
Variance                           206.99257        198.80505
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A
Value                Count    Frequency (%)
28                   15       3.4%
24                   15       3.4%
30                   13       2.9%
19                   13       2.9%
21                   13       2.9%
22                   13       2.9%
25                   13       2.9%
29                   13       2.9%
26                   11       2.5%
36                   11       2.5%
Other values (65)    230      51.6%
(Missing)            86       19.3%
Dataset B
Value                Count    Frequency (%)
30                   17       3.8%
28                   16       3.6%
18                   16       3.6%
29                   15       3.4%
21                   13       2.9%
36                   12       2.7%
19                   12       2.7%
22                   12       2.7%
35                   12       2.7%
16                   11       2.5%
Other values (64)    215      48.2%
(Missing)            95       21.3%
Dataset A, values in ascending order
Value    Count    Frequency (%)
0.42     1        0.2%
0.83     2        0.4%
1        5        1.1%
2        5        1.1%
3        2        0.4%
4        6        1.3%
5        3        0.7%
6        2        0.4%
7        1        0.2%
8        1        0.2%
Dataset B, values in ascending order
Value    Count    Frequency (%)
0.42     1        0.2%
0.75     1        0.2%
0.83     2        0.4%
0.92     1        0.2%
1        2        0.4%
2        3        0.7%
3        3        0.7%
4        4        0.9%
5        1        0.2%
6        1        0.2%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        6             7
Distinct (%)    1.3%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.45964126    0.50224215
Minimum         0             0
Maximum         5             8
Zeros           317           308
Zeros (%)       71.1%         69.1%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             2.75         2
Maximum                      5            8
Range                        5            8
Interquartile range (IQR)    1            1

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.92745095       1.133254
Coefficient of variation (CV)      2.0177713        2.2563897
Kurtosis                           7.6715909        21.972078
Mean                               0.45964126       0.50224215
Median Absolute Deviation (MAD)    0                0
Skewness                           2.6820674        4.1914151
Sum                                205              224
Variance                           0.86016526       1.2842646
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=6)
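
A histogram with fixed-size bins divides the interval from the minimum to the maximum into equal-width bins and counts the values falling in each. The Python sketch below applies that idea to Dataset A's SibSp value counts from this section (317, 93, 13, 9, 11 and 3 occurrences of the values 0 to 5). Placing the maximum in the last bin follows the usual convention that the final bin is closed on the right; whether the report's plot uses exactly this rule is an assumption.

```python
def fixed_bin_counts(value_counts: dict, bins: int) -> list:
    """Count observations per equal-width bin between the min and max value."""
    lo, hi = min(value_counts), max(value_counts)
    total = sum(value_counts.values())
    if hi == lo:
        # Degenerate case: a single distinct value fills the first bin.
        return [total] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value, count in value_counts.items():
        # The maximum sits on the right edge, so clamp it into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += count
    return counts


sibsp_a = {0: 317, 1: 93, 2: 13, 3: 9, 4: 11, 5: 3}
bin_counts = fixed_bin_counts(sibsp_a, bins=6)
print(bin_counts)
```

With six distinct integer values and six bins, each value lands in its own bin, so the bin counts equal the value counts.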
Dataset A
Value    Count    Frequency (%)
0        317      71.1%
1        93       20.9%
2        13       2.9%
4        11       2.5%
3        9        2.0%
5        3        0.7%
Dataset B
Value    Count    Frequency (%)
0        308      69.1%
1        107      24.0%
2        12       2.7%
3        6        1.3%
4        5        1.1%
8        5        1.1%
5        3        0.7%
Dataset A, values in ascending order
Value    Count    Frequency (%)
0        317      71.1%
1        93       20.9%
2        13       2.9%
3        9        2.0%
4        11       2.5%
5        3        0.7%
Dataset B, values in ascending order
Value    Count    Frequency (%)
0        308      69.1%
1        107      24.0%
2        12       2.7%
3        6        1.3%
4        5        1.1%
5        3        0.7%
8        5        1.1%

Parch
Real number (ℝ)

                Dataset A    Dataset B
Distinct        6            6
Distinct (%)    1.3%         1.3%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.3632287    0.36547085
Minimum         0            0
Maximum         5            5
Zeros           343          346
Zeros (%)       76.9%        77.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           0            0
95-th percentile             2            2
Maximum                      5            5
Range                        5            5
Interquartile range (IQR)    0            0

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.76313509       0.79241949
Coefficient of variation (CV)      2.1009769        2.1682153
Kurtosis                           8.0356805        8.8295011
Mean                               0.3632287        0.36547085
Median Absolute Deviation (MAD)    0                0
Skewness                           2.5407068        2.6804687
Sum                                162              163
Variance                           0.58237517       0.62792865
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=6)
Dataset A
Value    Count    Frequency (%)
0        343      76.9%
1        55       12.3%
2        43       9.6%
5        2        0.4%
4        2        0.4%
3        1        0.2%
Dataset B
Value    Count    Frequency (%)
0        346      77.6%
1        52       11.7%
2        40       9.0%
3        4        0.9%
5        3        0.7%
4        1        0.2%
Dataset A, values in ascending order
Value    Count    Frequency (%)
0        343      76.9%
1        55       12.3%
2        43       9.6%
3        1        0.2%
4        2        0.4%
5        2        0.4%
Dataset B, values in ascending order
Value    Count    Frequency (%)
0        346      77.6%
1        52       11.7%
2        40       9.0%
3        4        0.9%
4        1        0.2%
5        3        0.7%

Ticket
Text

                Dataset A    Dataset B
Distinct        389          383
Distinct (%)    87.2%        85.9%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.8161435    6.6569507
Min length       3            3

Characters and Unicode

                       Dataset A    Dataset B
Total characters       3040         2969
Distinct characters    35           32
Distinct categories    5            5
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        349          332
Unique (%)    78.3%        74.4%

Sample

           Dataset A    Dataset B
1st row    A/5 21173    230136
2nd row    11774        36568
3rd row    28228        29106
4th row    345773       A./5. 3235
5th row    16966        7534
Dataset A
Value                 Count    Frequency (%)
pc                    30       5.3%
c.a                   12       2.1%
2                     8        1.4%
ston/o                8        1.4%
soton/o.q             6        1.1%
a/5                   6        1.1%
soton/oq              6        1.1%
w./c                  5        0.9%
347082                5        0.9%
347077                4        0.7%
Other values (407)    475      84.1%
Dataset B
Value                 Count    Frequency (%)
pc                    25       4.5%
c.a                   13       2.3%
ca                    8        1.4%
w./c                  7        1.3%
a/5                   7        1.3%
2343                  5        0.9%
soton/oq              5        0.9%
2                     4        0.7%
ston/o                4        0.7%
347082                4        0.7%
Other values (404)    476      85.3%

Most occurring characters

Dataset A
Value                Count    Frequency (%)
3                    372      12.2%
1                    341      11.2%
2                    286      9.4%
7                    251      8.3%
4                    233      7.7%
6                    214      7.0%
0                    209      6.9%
5                    190      6.2%
9                    166      5.5%
8                    148      4.9%
Other values (25)    630      20.7%
Dataset B
Value                Count    Frequency (%)
3                    367      12.4%
1                    319      10.7%
2                    313      10.5%
4                    241      8.1%
7                    229      7.7%
0                    210      7.1%
6                    210      7.1%
5                    188      6.3%
9                    169      5.7%
8                    148      5.0%
Other values (22)    575      19.4%

Most occurring categories

Dataset A
Value                Count    Frequency (%)
Decimal Number       2410     79.3%
Uppercase Letter     348      11.4%
Other Punctuation    155      5.1%
Space Separator      119      3.9%
Lowercase Letter     8        0.3%
Dataset B
Value                Count    Frequency (%)
Decimal Number       2394     80.6%
Uppercase Letter     305      10.3%
Other Punctuation    153      5.2%
Space Separator      112      3.8%
Lowercase Letter     5        0.2%

Most frequent character per category

Decimal Number
Dataset A
Value    Count    Frequency (%)
3        372      15.4%
1        341      14.1%
2        286      11.9%
7        251      10.4%
4        233      9.7%
6        214      8.9%
0        209      8.7%
5        190      7.9%
9        166      6.9%
8        148      6.1%
Dataset B
Value    Count    Frequency (%)
3        367      15.3%
1        319      13.3%
2        313      13.1%
4        241      10.1%
7        229      9.6%
0        210      8.8%
6        210      8.8%
5        188      7.9%
9        169      7.1%
8        148      6.2%
Space Separator
Dataset A
Value      Count    Frequency (%)
(space)    119      100.0%
Dataset B
Value      Count    Frequency (%)
(space)    112      100.0%
Other Punctuation
Dataset A
Value    Count    Frequency (%)
.        104      67.1%
/        51       32.9%
Dataset B
Value    Count    Frequency (%)
.        105      68.6%
/        48       31.4%
Uppercase Letter
Dataset A
Value               Count    Frequency (%)
C                   74       21.3%
O                   62       17.8%
P                   47       13.5%
S                   43       12.4%
A                   34       9.8%
N                   24       6.9%
T                   21       6.0%
Q                   12       3.4%
W                   8        2.3%
I                   6        1.7%
Other values (6)    17       4.9%
Dataset B
Value               Count    Frequency (%)
C                   72       23.6%
O                   46       15.1%
P                   45       14.8%
A                   42       13.8%
S                   34       11.1%
N                   17       5.6%
T                   16       5.2%
W                   10       3.3%
Q                   8        2.6%
I                   4        1.3%
Other values (5)    11       3.6%
Lowercase Letter
Dataset A
Value    Count    Frequency (%)
a        2        25.0%
s        2        25.0%
r        1        12.5%
i        1        12.5%
l        1        12.5%
e        1        12.5%
Dataset B
Value    Count    Frequency (%)
a        2        40.0%
r        1        20.0%
i        1        20.0%
s        1        20.0%

Most occurring scripts

Dataset A
Value     Count    Frequency (%)
Common    2684     88.3%
Latin     356      11.7%
Dataset B
Value     Count    Frequency (%)
Common    2659     89.6%
Latin     310      10.4%

Most frequent character per script

Common
Dataset A
Value               Count    Frequency (%)
3                   372      13.9%
1                   341      12.7%
2                   286      10.7%
7                   251      9.4%
4                   233      8.7%
6                   214      8.0%
0                   209      7.8%
5                   190      7.1%
9                   166      6.2%
8                   148      5.5%
Other values (3)    274      10.2%
Dataset B
Value               Count    Frequency (%)
3                   367      13.8%
1                   319      12.0%
2                   313      11.8%
4                   241      9.1%
7                   229      8.6%
0                   210      7.9%
6                   210      7.9%
5                   188      7.1%
9                   169      6.4%
8                   148      5.6%
Other values (3)    265      10.0%
Latin
Dataset A
Value                Count    Frequency (%)
C                    74       20.8%
O                    62       17.4%
P                    47       13.2%
S                    43       12.1%
A                    34       9.6%
N                    24       6.7%
T                    21       5.9%
Q                    12       3.4%
W                    8        2.2%
I                    6        1.7%
Other values (12)    25       7.0%
Dataset B
Value               Count    Frequency (%)
C                   72       23.2%
O                   46       14.8%
P                   45       14.5%
A                   42       13.5%
S                   34       11.0%
N                   17       5.5%
T                   16       5.2%
W                   10       3.2%
Q                   8        2.6%
I                   4        1.3%
Other values (9)    16       5.2%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    3040     100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    2969     100.0%

Most frequent character per block

ASCII
Dataset A
Value                Count    Frequency (%)
3                    372      12.2%
1                    341      11.2%
2                    286      9.4%
7                    251      8.3%
4                    233      7.7%
6                    214      7.0%
0                    209      6.9%
5                    190      6.2%
9                    166      5.5%
8                    148      4.9%
Other values (25)    630      20.7%
Dataset B
Value                Count    Frequency (%)
3                    367      12.4%
1                    319      10.7%
2                    313      10.5%
4                    241      8.1%
7                    229      7.7%
0                    210      7.1%
6                    210      7.1%
5                    188      6.3%
9                    169      5.7%
8                    148      5.0%
Other values (22)    575      19.4%

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        176          172
Distinct (%)    39.5%        38.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            31.335957    32.871944
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           10           5
Zeros (%)       2.2%         1.1%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.05         7.2292
Q1                           7.8958       7.95105
median                       13           14.4583
Q3                           30           30.5
95-th percentile             112.67708    112.67708
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    22.1042      22.54895

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 52.374575        55.376185
Coefficient of variation (CV)      1.671389         1.6846033
Kurtosis                           35.709632        38.663772
Mean                               31.335957        32.871944
Median Absolute Deviation (MAD)    5.7729           6.7083
Skewness                           5.1115104        5.383644
Sum                                13975.837        14660.887
Variance                           2743.0962        3066.5219
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A
Value                 Count    Frequency (%)
13                    22       4.9%
7.8958                21       4.7%
8.05                  21       4.7%
26                    18       4.0%
7.75                  15       3.4%
10.5                  13       2.9%
26.55                 11       2.5%
0                     10       2.2%
7.925                 8        1.8%
7.25                  7        1.6%
Other values (166)    300      67.3%
Dataset B
Value                 Count    Frequency (%)
8.05                  27       6.1%
13                    22       4.9%
7.8958                21       4.7%
26                    20       4.5%
7.75                  17       3.8%
10.5                  16       3.6%
26.55                 10       2.2%
7.2292                7        1.6%
7.225                 7        1.6%
7.925                 6        1.3%
Other values (162)    293      65.7%
Dataset A, values in ascending order
Value     Count    Frequency (%)
0         10       2.2%
4.0125    1        0.2%
6.2375    1        0.2%
6.4375    1        0.2%
6.45      1        0.2%
6.4958    1        0.2%
6.75      1        0.2%
6.8583    1        0.2%
6.95      1        0.2%
6.975     1        0.2%
Dataset B, values in ascending order
Value     Count    Frequency (%)
0         5        1.1%
6.45      1        0.2%
6.4958    1        0.2%
6.75      2        0.4%
6.8583    1        0.2%
7.0458    1        0.2%
7.05      3        0.7%
7.0542    1        0.2%
7.225     7        1.6%
7.2292    7        1.6%

Cabin
Text

                Dataset A    Dataset B
Distinct        85           81
Distinct (%)    85.0%        83.5%
Missing         346          349
Missing (%)     77.6%        78.3%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       15           15
Median length    3            3
Mean length      3.73         3.5154639
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       373          341
Distinct characters    19           19
Distinct categories    3            3
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        74           67
Unique (%)    74.0%        69.1%

Sample

           Dataset A    Dataset B
1st row    C47          F4
2nd row    E34          D49
3rd row    A14          B77
4th row    F2           B51 B53 B55
5th row    E50          C106
Dataset A
Value                Count    Frequency (%)
c23                  4        3.4%
c27                  4        3.4%
c25                  4        3.4%
d                    3        2.5%
e101                 3        2.5%
c52                  2        1.7%
b98                  2        1.7%
b96                  2        1.7%
b28                  2        1.7%
b20                  2        1.7%
Other values (86)    91       76.5%
Dataset B
Value                Count    Frequency (%)
c22                  3        2.7%
g6                   3        2.7%
c26                  3        2.7%
f                    3        2.7%
f33                  2        1.8%
b77                  2        1.8%
b20                  2        1.8%
g73                  2        1.8%
e33                  2        1.8%
b18                  2        1.8%
Other values (82)    89       78.8%

Most occurring characters

Dataset A
Value               Count    Frequency (%)
C                   44       11.8%
2                   42       11.3%
1                   37       9.9%
5                   26       7.0%
3                   25       6.7%
B                   25       6.7%
6                   22       5.9%
(space)             19       5.1%
4                   19       5.1%
0                   18       4.8%
Other values (9)    96       25.7%
Dataset B
Value               Count    Frequency (%)
C                   33       9.7%
B                   31       9.1%
1                   30       8.8%
6                   28       8.2%
2                   28       8.2%
3                   28       8.2%
7                   21       6.2%
5                   19       5.6%
8                   19       5.6%
D                   18       5.3%
Other values (9)    86       25.2%

Most occurring categories

Dataset A
Value               Count    Frequency (%)
Decimal Number      235      63.0%
Uppercase Letter    119      31.9%
Space Separator     19       5.1%
Dataset B
Value               Count    Frequency (%)
Decimal Number      212      62.2%
Uppercase Letter    113      33.1%
Space Separator     16       4.7%

Most frequent character per category

Uppercase Letter
Dataset A
Value    Count    Frequency (%)
C        44       37.0%
B        25       21.0%
E        17       14.3%
D        15       12.6%
A        8        6.7%
F        6        5.0%
G        3        2.5%
T        1        0.8%
Dataset B
Value    Count    Frequency (%)
C        33       29.2%
B        31       27.4%
D        18       15.9%
E        12       10.6%
F        8        7.1%
G        6        5.3%
A        4        3.5%
T        1        0.9%
Decimal Number
Dataset A
Value    Count    Frequency (%)
2        42       17.9%
1        37       15.7%
5        26       11.1%
3        25       10.6%
6        22       9.4%
4        19       8.1%
0        18       7.7%
7        17       7.2%
8        16       6.8%
9        13       5.5%
Dataset B
Value    Count    Frequency (%)
1        30       14.2%
6        28       13.2%
2        28       13.2%
3        28       13.2%
7        21       9.9%
5        19       9.0%
8        19       9.0%
0        16       7.5%
4        12       5.7%
9        11       5.2%
Space Separator
Dataset A
Value      Count    Frequency (%)
(space)    19       100.0%
Dataset B
Value      Count    Frequency (%)
(space)    16       100.0%

Most occurring scripts

Dataset A
Value     Count    Frequency (%)
Common    254      68.1%
Latin     119      31.9%
Dataset B
Value     Count    Frequency (%)
Common    228      66.9%
Latin     113      33.1%

Most frequent character per script

Latin
Dataset A
Value    Count    Frequency (%)
C        44       37.0%
B        25       21.0%
E        17       14.3%
D        15       12.6%
A        8        6.7%
F        6        5.0%
G        3        2.5%
T        1        0.8%
Dataset B
Value    Count    Frequency (%)
C        33       29.2%
B        31       27.4%
D        18       15.9%
E        12       10.6%
F        8        7.1%
G        6        5.3%
A        4        3.5%
T        1        0.9%
Common
Dataset A
Value      Count    Frequency (%)
2          42       16.5%
1          37       14.6%
5          26       10.2%
3          25       9.8%
6          22       8.7%
(space)    19       7.5%
4          19       7.5%
0          18       7.1%
7          17       6.7%
8          16       6.3%
Dataset B
Value      Count    Frequency (%)
1          30       13.2%
6          28       12.3%
2          28       12.3%
3          28       12.3%
7          21       9.2%
5          19       8.3%
8          19       8.3%
(space)    16       7.0%
0          16       7.0%
4          12       5.3%

Most occurring blocks

Dataset A
Value    Count    Frequency (%)
ASCII    373      100.0%
Dataset B
Value    Count    Frequency (%)
ASCII    341      100.0%

Most frequent character per block

ASCII
Dataset A
Value               Count    Frequency (%)
C                   44       11.8%
2                   42       11.3%
1                   37       9.9%
5                   26       7.0%
3                   25       6.7%
B                   25       6.7%
6                   22       5.9%
(space)             19       5.1%
4                   19       5.1%
0                   18       4.8%
Other values (9)    96       25.7%
Dataset B
Value               Count    Frequency (%)
C                   33       9.7%
B                   31       9.1%
1                   30       8.8%
6                   28       8.2%
2                   28       8.2%
3                   28       8.2%
7                   21       6.2%
5                   19       5.6%
8                   19       5.6%
D                   18       5.3%
Other values (9)    86       25.2%

Embarked
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         2            0
Missing (%)     0.4%         0.0%
Memory size     7.0 KiB      7.0 KiB
Category counts, Dataset A: S (329), C (80), Q (35)
Category counts, Dataset B: S (326), C (80), Q (40)

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       444          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    S            S
2nd row    C            Q
3rd row    S            S
4th row    S            S
5th row    C            S

Common Values

ValueCountFrequency (%)
S 329
73.8%
C 80
 
17.9%
Q 35
 
7.8%
(Missing) 2
 
0.4%
ValueCountFrequency (%)
S 326
73.1%
C 80
 
17.9%
Q 40
 
9.0%

Length

2023-08-01T09:00:02.228165image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

Dataset A

2023-08-01T09:00:02.408839image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/

Dataset B

2023-08-01T09:00:02.581702image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/
ValueCountFrequency (%)
s 329
74.1%
c 80
 
18.0%
q 35
 
7.9%
ValueCountFrequency (%)
s 326
73.1%
c 80
 
17.9%
q 40
 
9.0%

Most occurring characters

ValueCountFrequency (%)
S 329
74.1%
C 80
 
18.0%
Q 35
 
7.9%
ValueCountFrequency (%)
S 326
73.1%
C 80
 
17.9%
Q 40
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 444
100.0%
ValueCountFrequency (%)
Uppercase Letter 446
100.0%

Most frequent character per category

Uppercase Letter
Dataset A

Value | Count | Frequency (%)
S | 329 | 74.1%
C | 80 | 18.0%
Q | 35 | 7.9%

Dataset B

Value | Count | Frequency (%)
S | 326 | 73.1%
C | 80 | 17.9%
Q | 40 | 9.0%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Latin | 444 | 100.0%

Dataset B

Value | Count | Frequency (%)
Latin | 446 | 100.0%

Most frequent character per script

Latin
Dataset A

Value | Count | Frequency (%)
S | 329 | 74.1%
C | 80 | 18.0%
Q | 35 | 7.9%

Dataset B

Value | Count | Frequency (%)
S | 326 | 73.1%
C | 80 | 17.9%
Q | 40 | 9.0%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 444 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII
Dataset A

Value | Count | Frequency (%)
S | 329 | 74.1%
C | 80 | 18.0%
Q | 35 | 7.9%

Dataset B

Value | Count | Frequency (%)
S | 326 | 73.1%
C | 80 | 17.9%
Q | 40 | 9.0%

Interactions

[Figures: 25 interaction plots, each shown for Dataset A and Dataset B]

Correlations

[Figures: correlation heatmaps for Dataset A and Dataset B]

Dataset A

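Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | -0.021 | -0.061 | 0.005 | -0.002 | 0.079 | 0.016 | 0.047 | 0.000
Age | -0.021 | 1.000 | -0.169 | -0.238 | 0.107 | 0.178 | 0.268 | 0.157 | 0.000
SibSp | -0.061 | -0.169 | 1.000 | 0.492 | 0.442 | 0.126 | 0.138 | 0.256 | 0.049
Parch | 0.005 | -0.238 | 0.492 | 1.000 | 0.426 | 0.146 | 0.000 | 0.312 | 0.000
Fare | -0.002 | 0.107 | 0.442 | 0.426 | 1.000 | 0.265 | 0.460 | 0.201 | 0.200
Survived | 0.079 | 0.178 | 0.126 | 0.146 | 0.265 | 1.000 | 0.291 | 0.554 | 0.183
Pclass | 0.016 | 0.268 | 0.138 | 0.000 | 0.460 | 0.291 | 1.000 | 0.102 | 0.289
Sex | 0.047 | 0.157 | 0.256 | 0.312 | 0.201 | 0.554 | 0.102 | 1.000 | 0.092
Embarked | 0.000 | 0.000 | 0.049 | 0.000 | 0.200 | 0.183 | 0.289 | 0.092 | 1.000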

Dataset B

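Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.021 | -0.052 | 0.048 | 0.046 | 0.067 | 0.101 | 0.095 | 0.000
Age | 0.021 | 1.000 | -0.132 | -0.215 | 0.054 | 0.155 | 0.216 | 0.021 | 0.000
SibSp | -0.052 | -0.132 | 1.000 | 0.424 | 0.464 | 0.170 | 0.122 | 0.230 | 0.056
Parch | 0.048 | -0.215 | 0.424 | 1.000 | 0.427 | 0.185 | 0.000 | 0.252 | 0.000
Fare | 0.046 | 0.054 | 0.464 | 0.427 | 1.000 | 0.296 | 0.489 | 0.195 | 0.206
Survived | 0.067 | 0.155 | 0.170 | 0.185 | 0.296 | 1.000 | 0.353 | 0.547 | 0.123
Pclass | 0.101 | 0.216 | 0.122 | 0.000 | 0.489 | 0.353 | 1.000 | 0.092 | 0.256
Sex | 0.095 | 0.021 | 0.230 | 0.252 | 0.195 | 0.547 | 0.092 | 1.000 | 0.113
Embarked | 0.000 | 0.000 | 0.056 | 0.000 | 0.206 | 0.123 | 0.256 | 0.113 | 1.000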
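
The report's alerts flag Survived and Sex as highly correlated. One way to read that off the matrix is to scan the off-diagonal cells for values at or above a cutoff. The sketch below does this for the Dataset A values; the 0.5 cutoff and the `high_pairs` helper are assumptions made for illustration and are not taken from the report or the profiler's configuration.

```python
from itertools import combinations

names = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
         "Survived", "Pclass", "Sex", "Embarked"]

# Dataset A correlation matrix, copied from the report.
matrix_a = [
    [1.000, -0.021, -0.061, 0.005, -0.002, 0.079, 0.016, 0.047, 0.000],
    [-0.021, 1.000, -0.169, -0.238, 0.107, 0.178, 0.268, 0.157, 0.000],
    [-0.061, -0.169, 1.000, 0.492, 0.442, 0.126, 0.138, 0.256, 0.049],
    [0.005, -0.238, 0.492, 1.000, 0.426, 0.146, 0.000, 0.312, 0.000],
    [-0.002, 0.107, 0.442, 0.426, 1.000, 0.265, 0.460, 0.201, 0.200],
    [0.079, 0.178, 0.126, 0.146, 0.265, 1.000, 0.291, 0.554, 0.183],
    [0.016, 0.268, 0.138, 0.000, 0.460, 0.291, 1.000, 0.102, 0.289],
    [0.047, 0.157, 0.256, 0.312, 0.201, 0.554, 0.102, 1.000, 0.092],
    [0.000, 0.000, 0.049, 0.000, 0.200, 0.183, 0.289, 0.092, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff for "high" correlation, chosen for illustration

def high_pairs(names, matrix, threshold):
    """Return (var_i, var_j, value) for each off-diagonal pair at or above the threshold."""
    return [
        (names[i], names[j], matrix[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(matrix[i][j]) >= threshold
    ]

flagged = high_pairs(names, matrix_a, THRESHOLD)
print(flagged)
```

With this cutoff, Survived and Sex (0.554) is the only pair flagged in Dataset A; the next largest value, SibSp and Parch at 0.492, falls just below it.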

Missing values

[Figures: nullity count by column for Dataset A and Dataset B]
A simple visualization of nullity by column.

[Figures: nullity matrix for Dataset A and Dataset B]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

[Figures: nullity correlation heatmap for Dataset A and Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
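
Nullity correlation can be sketched as an ordinary Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing). The example below uses only the standard library and a handful of hypothetical toy rows, not the report's data; `pearson` and `nullity_correlation` are illustrative helpers rather than the profiler's code.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

def nullity_correlation(rows, col_a, col_b):
    """Correlate the presence (1) or absence (0) of two columns across rows."""
    present_a = [0 if row[col_a] is None else 1 for row in rows]
    present_b = [0 if row[col_b] is None else 1 for row in rows]
    return pearson(present_a, present_b)

# Hypothetical toy rows, made up for this example.
rows = [
    {"Age": 22.0, "Cabin": None},
    {"Age": None, "Cabin": None},
    {"Age": 38.0, "Cabin": "C85"},
    {"Age": None, "Cabin": None},
    {"Age": 35.0, "Cabin": "C123"},
    {"Age": 54.0, "Cabin": None},
]

r = nullity_correlation(rows, "Age", "Cabin")
print(round(r, 3))
```

A value near 1 means the two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no linear relationship between their missingness.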

Sample

Dataset A

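(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
227 | 228 | 0 | 3 | Lovell, Mr. John Hall ("Henry") | male | 20.5 | 0 | 0 | A/5 21173 | 7.2500 | NaN | S
839 | 840 | 1 | 1 | Marechal, Mr. Pierre | male | NaN | 0 | 0 | 11774 | 29.7000 | C47 | C
418 | 419 | 0 | 2 | Matthews, Mr. William John | male | 30.0 | 0 | 0 | 28228 | 13.0000 | NaN | S
419 | 420 | 0 | 3 | Van Impe, Miss. Catharina | female | 10.0 | 0 | 2 | 345773 | 24.1500 | NaN | S
319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | female | 40.0 | 1 | 1 | 16966 | 134.5000 | E34 | C
179 | 180 | 0 | 3 | Leonard, Mr. Lionel | male | 36.0 | 0 | 0 | LINE | 0.0000 | NaN | S
105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S
439 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.5000 | NaN | S
475 | 476 | 0 | 1 | Clifford, Mr. George Quincy | male | NaN | 0 | 0 | 110465 | 52.0000 | A14 | S
221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.0 | 0 | 0 | 220367 | 13.0000 | NaN | S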

Dataset B

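(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
618 | 619 | 1 | 2 | Becker, Miss. Marion Louise | female | 4.0 | 2 | 1 | 230136 | 39.0000 | F4 | S
718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q
407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S
589 | 590 | 0 | 3 | Murdlin, Mr. Joseph | male | NaN | 0 | 0 | A./5. 3235 | 8.0500 | NaN | S
138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S
32 | 33 | 1 | 3 | Glynn, Miss. Mary Agatha | female | NaN | 0 | 0 | 335677 | 7.7500 | NaN | Q
367 | 368 | 1 | 3 | Moussa, Mrs. (Mantoura Boulos) | female | NaN | 0 | 0 | 2626 | 7.2292 | NaN | C
338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S
563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S
767 | 768 | 0 | 3 | Mangan, Miss. Mary | female | 30.5 | 0 | 0 | 364850 | 7.7500 | NaN | Q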

Dataset A

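(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
509 | 510 | 1 | 3 | Lang, Mr. Fang | male | 26.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
613 | 614 | 0 | 3 | Horgan, Mr. John | male | NaN | 0 | 0 | 370377 | 7.7500 | NaN | Q
590 | 591 | 0 | 3 | Rintamaki, Mr. Matti | male | 35.0 | 0 | 0 | STON/O 2. 3101273 | 7.1250 | NaN | S
588 | 589 | 0 | 3 | Gilinski, Mr. Eliezer | male | 22.0 | 0 | 0 | 14973 | 8.0500 | NaN | S
289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q
211 | 212 | 1 | 2 | Cameron, Miss. Clear Annie | female | 35.0 | 0 | 0 | F.C.C. 13528 | 21.0000 | NaN | S
606 | 607 | 0 | 3 | Karaic, Mr. Milan | male | 30.0 | 0 | 0 | 349246 | 7.8958 | NaN | S
526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S
830 | 831 | 1 | 3 | Yasbeck, Mrs. Antoni (Selini Alexander) | female | 15.0 | 1 | 0 | 2659 | 14.4542 | NaN | C
465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S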

Dataset B

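(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
46 | 47 | 0 | 3 | Lennon, Mr. Denis | male | NaN | 1 | 0 | 370371 | 15.5000 | NaN | Q
196 | 197 | 0 | 3 | Mernagh, Mr. Robert | male | NaN | 0 | 0 | 368703 | 7.7500 | NaN | Q
662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.00 | 0 | 0 | 5727 | 25.5875 | E58 | S
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.00 | 1 | 1 | PP 9549 | 16.7000 | G6 | S
683 | 684 | 0 | 3 | Goodwin, Mr. Charles Edward | male | 14.00 | 5 | 2 | CA 2144 | 46.9000 | NaN | S
342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.00 | 0 | 0 | 248740 | 13.0000 | NaN | S
846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
279 | 280 | 1 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.00 | 1 | 1 | C.A. 2673 | 20.2500 | NaN | S
803 | 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C
850 | 851 | 0 | 3 | Andersson, Master. Sigvard Harald Elias | male | 4.00 | 4 | 2 | 347082 | 31.2750 | NaN | S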

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.